Route DebuggerController operations through a per-controller worker queue by xusheng6 · Pull Request #1090 · Vector35/debugger

xusheng6 · 2026-05-20T20:44:50Z

Replaces the "spawn one detached std::thread per operation" model in DebuggerController with a single per-controller worker thread + FIFO queue, and makes the controller own its state directly instead of bouncing it through the event dispatcher.

Implements #1088 and #1089. Supersedes the auto-closed #1087.

Why

The per-op detach pattern (17 sites) captured this and detached, so a controller could be destroyed while a worker was still running — the recurring use-after-free family in Sentry (#1083, BINARYNINJA-1A/3X/5Z/60/7M). The queue makes lifetime structural: the worker is owned by the controller and joined in the destructor, so no adapter work can outlive it.

What changed

Worker queue. All ops route through Submit / SubmitAndWait. One worker thread per controller, joined on destruction. Submit runs inline if called from the worker itself (so Restart → Quit + Launch doesn't self-deadlock).
Timeout. Every XxxAndWait gains an optional std::chrono::milliseconds timeout (default = wait forever). Existing callers unchanged.
Pause / Restart / Quit / Detach interrupt a running target out-of-band: they call BreakInto() on the caller's thread, then queue the actual op. Without this they'd sit behind the in-flight resume forever.
Adapter-stop channel. AdapterStoppedEvent is now an internal worker signal (condvar), not a public dispatcher event. Conditional-breakpoint silent-resume moved onto the worker; the dispatcher's stop machinery, m_lastAdapterStopEventConsumed, and m_suppressResumeEvent are gone. Adapters are unchanged.
State on the worker. EventHandler (the controller subscribing to its own broadcast) is replaced by ApplyOwnStateForEvent, called inline in PostDebuggerEvent before the event is broadcast. The dispatcher is now pure fan-out to external consumers (UI/plugins). This removes the worker-vs-dispatcher race that broke Restart.
Mutexes removed. m_adapterMutex / m_adapterMutex2 guarded concurrent adapter access that can no longer happen; replaced with an assert(on worker thread).
Stop is always reported. The worker op calls NotifyStopped even on a user-requested break (the old Pause path used to do it; the new out-of-band Pause doesn't), fixing "target halts but UI still shows running".

API compatibility

Public method signatures unchanged (timeout is an appended optional arg). Callers in core/ffi.cpp and ui/uinotification.cpp need no changes.

Out of scope (follow-ups)

Adapter-internal threads (EngineLoop, LLDB listener, RSP loops) — need adapter dtors as join barriers.
m_targetControlMutex is now largely redundant; kept to limit churn.
Refcounted DebugAdapter was tried and reverted — the adapter has no independent lifecycle (freed only in ~DebuggerState), so it solved no real UAF.
UI repaint throttling for fast stepping — UI widgets: mark-dirty-on-change, repaint at most every 100ms #1093.

Test plan

Builds clean (ninja — core + ui)
Launch / step / breakpoint / pause while running / detach mid-step / close tab mid-op
Restart on a running target
si typed directly in the LLDB REPL while idle (spontaneous stop)
Conditional breakpoint with a false condition (silent resume)
IL step surfaces one stop per user action
test/debugger_test.py

bdash · 2026-05-26T15:41:38Z

+	// pair (which guarded against concurrent adapter access from multiple spawned threads)
+	// is no longer needed. The new Pause path bypasses this method entirely and calls
+	// m_adapter->BreakInto() out-of-band; everything else funnels through here on the worker.
+	assert(std::this_thread::get_id() == m_workerThreadId);


I'd recommend using BN_ASSERT or BN_RELEASE_ASSERT, depending on whether we want to validate this only in debug builds or also in our release builds. I'd lean towards the latter given that this check is cheap.

bdash · 2026-05-26T17:27:18Z

 DebuggerController::~DebuggerController()
 {
+	{
+		std::lock_guard<std::mutex> lock(m_workQueueMutex);


I think there's an issue with the way mutexes and condition variables are used here that can lead to missed wakeups. Specifically, DebuggerController::WaitForAdapterStop uses m_adapterStopMutex with its condition variable, while this store occurs with m_workQueueMutex held. This means there's a race that can happen where m_workerShouldExit is set here but DebuggerController::WaitForAdapterStop does not see it.

This is related to the note at https://en.cppreference.com/cpp/thread/condition_variable that says:

Even if the shared variable is atomic, it must be modified while owning the mutex to correctly publish the modification to the waiting thread.

I changed it to

{ std::scoped_lock shutdownLock(m_workQueueMutex, m_adapterStopMutex); m_workerShouldExit = true; }

which I believe should fix the issue

bdash · 2026-05-26T17:31:11Z

+	// (GoAndWaitOnWorker etc.) is the sole notifier; it calls NotifyStopped on every
+	// genuine stop regardless of m_userRequestedBreak. m_userRequestedBreak only
+	// suppresses conditional-breakpoint auto-resume in ShouldSilentResumeAfterStop.
+	m_userRequestedBreak = true;


Is it safe to modify m_userRequestedBreak without taking any locks? It appears to potentially be read / written from different threads, so either needs to always be accessed with a lock held or to be std::atomic<bool>.

this boolean is a leftover from the old code. It is no longer needed in the new design, I will delete it

bdash · 2026-05-26T17:34:08Z


 	private:
+		// m_adapter is the active debug adapter for this controller. Written exclusively
+		// from the worker thread (in CreateDebugAdapter); read both from the worker


Since this field is accessed by multiple threads, don't accesses need to be guarded by a mutex to avoid being data races?

yeah data races are not the focus of this PR, they will need a separate one to address

bdash · 2026-05-26T17:42:41Z


+void DebuggerController::WorkerThreadMain()
+{
+	m_workerThreadId = std::this_thread::get_id();


This is set on the worker thread but read from other threads without any synchronization, which is a data race. This needs some form of synchronization.

Another option would be to skip storing a thread ID, and instead use a thread-local variable that points to the DebuggerController instance when on the worker thread and is nullptr otherwise. To check whether code is running on the worker thread you'd then instead test t_controllerOnWorker == this. Since it is a thread-local variable there is no need for synchronization.

I adopted the suggested method

bdash · 2026-05-26T18:00:58Z

+// caches updated via UpdateCaches) are mutated under the worker's serialization
+// for ops the worker generates (TargetStopped, LaunchFailure, etc.) and under
+// the adapter event thread for spontaneous events (TargetExited, RegisterChanged,
+// ActiveThreadChanged). Concurrent mutation is unlikely in practice because the


I'm concerned by this reasoning. It seems to say that this code contains data races, but it's fine because they probably don't happen very often.

A data race is when two or more threads access the same memory location, with at least one of those accesses being a write, and there is no synchronization between the accesses (synchronization here is defined in terms of the happens-before relationship).

In C++, if a data race occurs, the behavior of the program is undefined. The compiler is free to assume that data races do not happen and optimize the code on that basis. This makes it impossible to reason about what the code is actually doing.

bdash · 2026-05-26T18:07:57Z

+		// in-flight op to settle before returning). A timeout of milliseconds::max() means
+		// "wait forever" and bypasses wait_for entirely (avoids overflow inside the stdlib).
+		template<typename F>
+		auto SubmitAndWait(F&& f, std::chrono::milliseconds timeout)


What behavior does this have if run on the worker thread? Submit has an explicit case for it. Does this need one? If it's never expected to be called from the worker thread, an assert would serve to document that and catch any potential future misuse.

After looking at the code closer, I have decided to make both of them NOT re-entrant. There is also no place where they are called in a reentrant way.

Just for the record -- Submit can be made re-entrant safe, and SubmitAndWait, I believe, cannot

bdash · 2026-05-26T18:08:31Z

+		DebugStopReason RestartAndWaitOnWorker();
+		void DetachAndWaitOnWorker();
+		void QuitAndWaitOnWorker();
+		DebugStopReason PauseAndWaitOnWorker();


PauseAndWaitOnWorker appears to not be referenced anywhere.

xusheng6 · 2026-05-27T15:43:36Z

Hi @bdash thanks for the review! I forgot to mention that this PR mainly means to be clean up the lifetime + control-flow issues, so that I can deal with the data races in a follow-up PR. I did a bad job in communicating this, and I guess you could not have known that because the last commit I added did appear to trying to handle the data races. I know that one is not sufficient yet -- I was just testing the general visibility of the idea, and data races will be addressed in a future PR

I will go ahead and address the new issues you mention that are introduced by the new code, and have you look at it again

bdash · 2026-05-29T19:56:41Z

+// Thread-safety: this body touches m_state's connection/execution status, plus
+// m_lastIP / m_currentIP / m_exitCode on the controller -- all plain (non-atomic,
+// unlocked) fields read from other threads (UI render layer, file accessor) without
+// synchronization. Those are pre-existing data races reported in #1091; this PR


The remainder of this comment is in terms of "this PR" and "this commit", which won't make much sense once the change is merged.

I'd probably just end this with "Fixing these data races is tracked in #1091".

xusheng6 force-pushed the test_refactor-worker-queue branch from d4973e0 to 3a44260 Compare May 21, 2026 19:58

This was referenced May 21, 2026

Refcount DebugAdapter to enforce cross-thread lifetime guarantees #1092

Closed

UI widgets: mark-dirty-on-change, repaint at most every 100ms #1093

Open

bdash reviewed May 26, 2026

View reviewed changes

bdash approved these changes May 29, 2026

View reviewed changes

Refactor debugger execution and event model

4ff2922

xusheng6 force-pushed the test_refactor-worker-queue branch from 4d3a9ae to 4ff2922 Compare June 16, 2026 21:05

xusheng6 merged commit 723b403 into dev Jun 16, 2026
1 check passed

xusheng6 deleted the test_refactor-worker-queue branch June 16, 2026 21:08

Conversation

xusheng6 commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What changed

API compatibility

Out of scope (follow-ups)

Test plan

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xusheng6 commented May 27, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

xusheng6 commented May 20, 2026 •

edited

Loading